First session:
- Not heaps of paleo-specific R
- But building blocks to make you an expeRt
- Things that go into R (data inputs)
- How to structure your data inputs and outputs
- Getting started in R
26/06/2018
Pros and cons of the following:
[Screenshot: GitHub]
Three principles:
Note that we separate units of metadata with a "_" and, within units, with a "-".
This applies to scripts and data
e.g. function_clean-italics-tilia.R
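One payoff of this convention is that filenames can be taken apart by code. A minimal base-R sketch, reusing the example filename above (the variable names `fname`, `stem`, `units` and `words` are just illustrative):

```r
# Filename following the convention: "_" between units, "-" within a unit
fname <- "function_clean-italics-tilia.R"

# Drop the file extension, then split into units of metadata on "_"
stem  <- tools::file_path_sans_ext(fname)
units <- strsplit(stem, "_", fixed = TRUE)[[1]]
units

# Split the second unit into its individual words on "-"
words <- strsplit(units[2], "-", fixed = TRUE)[[1]]
words
```

Because the separators are used consistently, the same two `strsplit()` calls should work for any script or data file named this way.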
[Screenshot: example data]
[Screenshot: RStudio]
getwd()
## [1] "/Users/oliviaburge/Documents/paleo-R-workshop/1-folders-spreadsheets-organisingData"
setwd("path/to/your/folder") # placeholder path: setwd() needs a directory
sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] DiagrammeR_1.0.0
##
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.17      highr_0.6          pillar_1.2.1
##  [4] compiler_3.5.0    RColorBrewer_1.1-2 influenceR_0.1.0
##  [7] plyr_1.8.4        bindr_0.1.1        viridis_0.5.1
## [10] tools_3.5.0       digest_0.6.15      jsonlite_1.5
## [13] viridisLite_0.3.0 gtable_0.2.0       evaluate_0.10.1
## [16] tibble_1.4.2      rgexf_0.15.3       pkgconfig_2.0.1
## [19] rlang_0.2.1       igraph_1.2.1       rstudioapi_0.7
## [22] yaml_2.1.19       bindrcpp_0.2.2     gridExtra_2.3
## [25] downloader_0.4    dplyr_0.7.5        stringr_1.3.1
## [28] knitr_1.20        htmlwidgets_1.2    hms_0.4.2
## [31] grid_3.5.0        rprojroot_1.3-2    tidyselect_0.2.4
## [34] glue_1.2.0.9000   R6_2.2.2           Rook_1.1-1
## [37] XML_3.98-1.11     rmarkdown_1.10     ggplot2_2.2.1.9000
## [40] tidyr_0.8.0       purrr_0.2.5        readr_1.1.1
## [43] magrittr_1.5      backports_1.1.2    scales_0.5.0
## [46] htmltools_0.3.6   assertthat_0.2.0   colorspace_1.3-2
## [49] brew_1.0-6        stringi_1.2.3      visNetwork_2.0.3
## [52] lazyeval_0.2.1    munsell_0.4.3
require(tidyverse)
## Loading required package: tidyverse
## ── Attaching packages ────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1.9000     ✔ purrr   0.2.5
## ✔ tibble  1.4.2          ✔ dplyr   0.7.5
## ✔ tidyr   0.8.0          ✔ stringr 1.3.1
## ✔ readr   1.1.1          ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
# install.packages("vegan")
# install.packages("skimr")
require(vegan)
require(skimr)
data("mite") # the data command only works for in-built data
data("mite.env") # we'll cover reading in your own data later on
head(mite, n = 6)
| Brachy | PHTH | HPAV | RARD | SSTR | Protopl | MEGR | MPRO | TVIE | HMIN | HMIN2 | NPRA | TVEL | ONOV | SUCT | LCIL | Oribatl1 | Ceratoz1 | PWIL | Galumna1 | Stgncrs2 | HRUF | Trhypch1 | PPEL | NCOR | SLAT | FSET | Lepidzts | Eupelops | Miniglmn | LRUG | PLAG2 | Ceratoz3 | Oppiminu | Trimalc2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17 | 5 | 5 | 3 | 2 | 1 | 4 | 2 | 2 | 1 | 4 | 1 | 17 | 4 | 9 | 50 | 3 | 1 | 1 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 7 | 16 | 0 | 6 | 0 | 4 | 2 | 0 | 0 | 1 | 3 | 21 | 27 | 12 | 138 | 6 | 0 | 1 | 3 | 9 | 1 | 1 | 1 | 2 | 2 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 3 | 1 | 1 | 2 | 0 | 3 | 0 | 0 | 0 | 6 | 3 | 20 | 17 | 10 | 89 | 3 | 0 | 2 | 1 | 8 | 0 | 3 | 0 | 2 | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 23 | 7 | 10 | 2 | 2 | 0 | 4 | 0 | 1 | 2 | 10 | 0 | 18 | 47 | 17 | 108 | 10 | 1 | 0 | 1 | 2 | 1 | 2 | 1 | 3 | 2 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 8 | 13 | 9 | 0 | 13 | 0 | 0 | 0 | 3 | 14 | 3 | 32 | 43 | 27 | 5 | 1 | 0 | 5 | 2 | 1 | 0 | 1 | 0 | 0 | 0 | 12 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 19 | 7 | 5 | 9 | 3 | 2 | 3 | 0 | 0 | 20 | 16 | 2 | 13 | 38 | 39 | 3 | 5 | 0 | 1 | 1 | 8 | 0 | 4 | 0 | 1 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
skim(mite)
## Skim summary statistics
##  n obs: 70
##  n variables: 35
##
## ── Variable type:integer ──────────────────────────────────────────────────────────────────
## variable missing complete  n  mean    sd p0  p25  p50   p75 p100 hist
## Brachy         0       70 70  8.73 10.08  0    3  4.5 11.75   42 ▇▂▁▂▁▁▁▁
## Ceratoz1       0       70 70  1.29  1.46  0    0    1     2    5 ▇▆▁▃▁▁▁▁
## Ceratoz3       0       70 70   1.3   2.2  0    0    0     2    9 ▇▁▁▁▁▁▁▁
## Eupelops       0       70 70  0.64  0.99  0    0    0     1    4 ▇▃▁▁▁▁▁▁
## FSET           0       70 70  1.86  3.18  0    0    0     2   12 ▇▂▁▁▁▁▁▁
## Galumna1       0       70 70  0.96  1.73  0    0    0     1    8 ▇▁▁▁▁▁▁▁
## HMIN           0       70 70  4.91  8.47  0    0    0  4.75   36 ▇▁▁▁▁▁▁▁
## HMIN2          0       70 70  1.96  3.92  0    0    0  2.75   20 ▇▂▁▁▁▁▁▁
## HPAV           0       70 70  8.51  7.56  0    4  6.5    12   37 ▇▇▃▃▁▁▁▁
## HRUF           0       70 70  0.23  0.62  0    0    0     0    3 ▇▁▁▁▁▁▁▁
## LCIL           0       70 70 35.26 88.85  0 1.25   13    44  723 ▇▁▁▁▁▁▁▁
## Lepidzts       0       70 70  0.17  0.54  0    0    0     0    3 ▇▁▁▁▁▁▁▁
## LRUG           0       70 70 10.43 12.66  0    0  4.5 17.75   57 ▇▂▂▁▁▁▁▁
## MEGR           0       70 70  2.19  3.62  0    0    1     3   17 ▇▂▁▁▁▁▁▁
## Miniglmn       0       70 70  0.24  0.79  0    0    0     0    5 ▇▁▁▁▁▁▁▁
## MPRO           0       70 70  0.16  0.47  0    0    0     0    2 ▇▁▁▁▁▁▁▁
## NCOR           0       70 70  1.13  1.65  0    0  0.5  1.75    7 ▇▃▂▂▁▁▁▁
## NPRA           0       70 70  1.89  2.37  0    0    1  2.75   10 ▇▂▂▁▁▁▁▁
## ONOV           0       70 70 17.27 18.05  0    5 10.5 24.25   73 ▇▃▂▁▁▁▁▁
## Oppiminu       0       70 70  1.11  1.84  0    0    0  1.75    9 ▇▁▁▁▁▁▁▁
## Oribatl1       0       70 70  1.89  3.43  0    0    0  2.75   17 ▇▁▁▁▁▁▁▁
## PHTH           0       70 70  1.27  2.17  0    0    0     2    8 ▇▁▁▁▁▁▁▁
## PLAG2          0       70 70   0.8  1.79  0    0    0     1    9 ▇▁▁▁▁▁▁▁
## PPEL           0       70 70  0.17  0.54  0    0    0     0    3 ▇▁▁▁▁▁▁▁
## Protopl        0       70 70  0.37  1.61  0    0    0     0   13 ▇▁▁▁▁▁▁▁
## PWIL           0       70 70  1.09  1.71  0    0    0     1    8 ▇▁▁▁▁▁▁▁
## RARD           0       70 70  1.21  2.78  0    0    0     1   13 ▇▂▁▁▁▁▁▁
## SLAT           0       70 70   0.4  1.23  0    0    0     0    8 ▇▁▁▁▁▁▁▁
## SSTR           0       70 70  0.31  0.97  0    0    0     0    6 ▇▁▁▁▁▁▁▁
## Stgncrs2       0       70 70  0.73  1.83  0    0    0     0    9 ▇▁▁▁▁▁▁▁
## SUCT           0       70 70 16.96 13.89  0 7.25 13.5    24   63 ▇▇▆▅▂▁▁▁
## Trhypch1       0       70 70  2.61  6.14  0    0    0     2   29 ▇▁▁▁▁▁▁▁
## Trimalc2       0       70 70  2.07  5.79  0    0    0     0   33 ▇▁▁▁▁▁▁▁
## TVEL           0       70 70  9.06 10.93  0    0    3    19   42 ▇▁▁▂▁▁▁▁
## TVIE           0       70 70  0.83  1.47  0    0    0     1    7 ▇▁▁▁▁▁▁▁
select() chooses which columns to keep or to drop. The format is:
DATANAME %>% select(col1, col2, col3)
mite %>% select(Brachy, PHTH, HPAV)
##    Brachy PHTH HPAV
##  1     17    5    5
##  2      2    7   16
##  3      4    3    1
##  4     23    7   10
##  5      5    8   13
##  6     19    7    5
##  7     17    3    8
##  8      5    4    8
##  9      3    3    2
## 10     22    4    5
## 11     36    7   35
## 12     28    2   12
## 13      3    2    4
## 14     41    5   12
## 15      6    0    6
## 16      7    2    3
## 17      9    0    1
## 18     19    3    7
## 19     12    2   10
## 20      3    1    7
## 21      5    2    8
## 22      4    0    4
## 23     19    0    8
## 24      4    0    1
## 25     12    4   15
## 26      6    0    4
## 27      4    4    4
## 28      9    0    4
## 29     42    0    6
## 30     20    1    2
## 31     12    0    5
## 32      4    0    9
## 33     38    0   17
## 34      5    0   14
## 35      3    0    0
## 36      3    1    2
## 37      3    0    5
## 38      8    0    6
## 39      0    0    0
## 40      1    0   31
## 41      2    0   10
## 42      0    0   12
## 43      5    0    2
## 44      0    0    2
## 45     11    0    8
## 46      4    0    4
## 47      0    0    8
## 48      0    0    3
## 49     10    0   14
## 50      4    0   37
## 51      2    0    5
## 52      3    0    4
## 53      3    0   17
## 54      2    0    7
## 55      1    0    3
## 56      1    0   16
## 57      0    0    0
## 58      0    0   12
## 59      1    0    0
## 60      1    0   16
## 61      6    0    9
## 62      3    0    5
## 63     19    0    3
## 64      3    0   16
## 65      4    0   10
## 66      8    0   18
## 67      4    0    3
## 68      6    0   22
## 69     20    2    4
## 70      5    0   11
names(mite)
##  [1] "Brachy"   "PHTH"     "HPAV"     "RARD"     "SSTR"     "Protopl"
##  [7] "MEGR"     "MPRO"     "TVIE"     "HMIN"     "HMIN2"    "NPRA"
## [13] "TVEL"     "ONOV"     "SUCT"     "LCIL"     "Oribatl1" "Ceratoz1"
## [19] "PWIL"     "Galumna1" "Stgncrs2" "HRUF"     "Trhypch1" "PPEL"
## [25] "NCOR"     "SLAT"     "FSET"     "Lepidzts" "Eupelops" "Miniglmn"
## [31] "LRUG"     "PLAG2"    "Ceratoz3" "Oppiminu" "Trimalc2"
mite %>% select(-c(PHTH:Oppiminu))
##    Brachy Trimalc2
##  1     17        0
##  2      2        0
##  3      4        0
##  4     23        0
##  5      5        0
##  6     19        0
##  7     17        0
##  8      5        0
##  9      3        0
## 10     22        0
## 11     36        0
## 12     28        0
## 13      3        0
## 14     41        0
## 15      6        0
## 16      7        0
## 17      9        0
## 18     19        0
## 19     12        0
## 20      3        0
## 21      5        0
## 22      4        0
## 23     19        0
## 24      4        0
## 25     12        0
## 26      6        0
## 27      4        0
## 28      9        0
## 29     42        0
## 30     20        0
## 31     12        0
## 32      4        0
## 33     38        0
## 34      5        0
## 35      3        0
## 36      3        0
## 37      3        0
## 38      8        0
## 39      0        0
## 40      1        0
## 41      2        0
## 42      0        0
## 43      5        1
## 44      0        0
## 45     11        0
## 46      4        0
## 47      0        0
## 48      0        1
## 49     10        0
## 50      4        0
## 51      2        1
## 52      3        0
## 53      3        9
## 54      2        1
## 55      1        0
## 56      1        5
## 57      0        0
## 58      0        0
## 59      1        1
## 60      1        1
## 61      6        5
## 62      3        0
## 63     19        8
## 64      3       11
## 65      4       25
## 66      8        9
## 67      4       33
## 68      6       17
## 69     20        3
## 70      5       14
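To see keeping and dropping side by side on something small, here is a sketch using a toy data frame. `mite_small` is a made-up name; its values are the first three rows of four mite columns, copied from the head(mite) output earlier.

```r
library(dplyr)

# Toy data: first three rows of four mite columns, copied from head(mite)
mite_small <- data.frame(
  Brachy = c(17, 2, 4),
  PHTH   = c(5, 7, 3),
  HPAV   = c(5, 16, 1),
  RARD   = c(3, 0, 1)
)

# Keep columns by naming them
kept <- mite_small %>% select(Brachy, HPAV)

# Drop a run of adjacent columns: ":" gives the range, "-" removes it
dropped <- mite_small %>% select(-c(PHTH:HPAV))

names(kept)
names(dropped)
```

The ":" range depends on column order, so it is worth checking names() first, as on the previous slide.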
Chaining with the pipe (%>%) lets us write code in the order we want it run. Without the pipe, the calls must be nested inside brackets, with the first step to run sitting right in the middle.
mite.env %>% select(Shrub, Topo)
means: take the mite.env data frame, and then select the columns Shrub and Topo.
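The difference between nesting and chaining is easiest to see with two steps. A small sketch, assuming a toy stand-in for mite.env (`env_small` is a made-up name; the values are copied from mite.env rows printed elsewhere in this session):

```r
library(dplyr)

# Toy stand-in for mite.env (values copied from rows shown in this session)
env_small <- data.frame(
  WatrCont = c(691.79, 708.16, 656.35, 434.81),
  Shrub    = c("Few", "Few", "None", "Few"),
  Topo     = c("Blanket", "Blanket", "Blanket", "Hummock")
)

# Nested: read from the inside out (filter runs first, select second)
nested <- select(filter(env_small, WatrCont > 650), Shrub, Topo)

# Piped: read top to bottom, in the order the steps run
piped <- env_small %>%
  filter(WatrCont > 650) %>%
  select(Shrub, Topo)

# Both versions give the same result
identical(nested, piped)
```

With only two steps the nested form is still readable; the pipe pays off as the number of steps grows.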
filter() keeps the rows of your data frame that meet the conditions you specify. The format is similar to select(), and DATA can equally be piped in with %>%:
filter(DATA, CONDITION1)
filter(DATA, CONDITION1 & CONDITION2)
filter(DATA, CONDITION1 | CONDITION2)
mite.env %>% filter(WatrCont > 650)
##   SubsDens WatrCont Substrate Shrub    Topo
## 1    64.75   691.79   Sphagn2   Few Blanket
## 2    62.38   708.16  Barepeat   Few Blanket
## 3    52.73   656.35   Sphagn1  None Blanket
## 4    52.12   826.96   Sphagn1  None Blanket
Here, Shrub has to equal "Few" (note the double ==). If you want to match several values (such as two sites), see the next slide.
mite.env %>% filter(WatrCont > 650 & Shrub == "Few")
##   SubsDens WatrCont Substrate Shrub    Topo
## 1    64.75   691.79   Sphagn2   Few Blanket
## 2    62.38   708.16  Barepeat   Few Blanket
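The "&" and "|" forms behave quite differently, which a toy example makes concrete. A sketch, assuming a small stand-in for mite.env (`env_small` is a made-up name; values are copied from mite.env rows shown in this session):

```r
library(dplyr)

# Toy stand-in for mite.env (values copied from rows shown in this session)
env_small <- data.frame(
  WatrCont = c(691.79, 708.16, 656.35, 826.96, 434.81),
  Shrub    = c("Few", "Few", "None", "None", "Few"),
  stringsAsFactors = FALSE
)

# AND: a row must meet both conditions
both <- env_small %>% filter(WatrCont > 650 & Shrub == "Few")

# OR: a row needs to meet only one of the conditions
either <- env_small %>% filter(WatrCont > 650 | Shrub == "Few")

nrow(both)
nrow(either)
```

Adding conditions with "&" can only shrink the result, while "|" can only grow it.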
unique(mite.env$Substrate)
## [1] Sphagn1   Litter    Interface Sphagn3   Sphagn4   Sphagn2   Barepeat
## Levels: Sphagn1 Sphagn2 Sphagn3 Sphagn4 Litter Barepeat Interface
mite.env %>% filter(Substrate %in% c("Litter", "Barepeat", "Interface"))
##    SubsDens WatrCont Substrate Shrub    Topo
##  1    54.99   434.81    Litter   Few Hummock
##  2    46.07   371.72 Interface   Few Hummock
##  3    80.59   266.78 Interface  Many Blanket
##  4    61.43   310.70    Litter  Many Blanket
##  5    37.25   239.51 Interface  Many Blanket
##  6    59.93   350.64 Interface  Many Blanket
##  7    35.41   321.87 Interface   Few Hummock
##  8    29.56   296.95 Interface  Many Hummock
##  9    44.10   383.83 Interface  Many Blanket
## 10    38.61   145.68 Interface  Many Hummock
## 11    32.27   291.59 Interface  Many Hummock
## 12    35.30   293.49 Interface  Many Blanket
## 13    32.86   323.12 Interface  Many Hummock
## 14    37.33   284.27 Interface  Many Blanket
## 15    53.17   367.11 Interface  Many Blanket
## 16    34.76   393.62 Interface   Few Blanket
## 17    47.74   528.44 Interface   Few Blanket
## 18    34.26   398.20 Interface   Few Blanket
## 19    26.60   386.37 Interface   Few Blanket
## 20    56.65   581.00 Interface   Few Blanket
## 21    62.38   708.16  Barepeat   Few Blanket
## 22    46.81   538.51 Interface   Few Blanket
## 23    33.98   323.96 Interface   Few Blanket
## 24    28.29   434.28 Interface  None Blanket
## 25    26.83   414.65 Interface  None Blanket
## 26    31.98   447.65 Interface  None Blanket
## 27    41.38   532.88 Interface  None Blanket
## 28    56.82   613.39  Barepeat  None Blanket
## 29    47.03   626.36 Interface  None Blanket
## 30    48.59   634.75 Interface  None Blanket
## 31    35.03   482.27 Interface  None Blanket
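%in% is shorthand for a chain of == conditions joined with "|". A sketch on a toy stand-in for mite.env (`env_small` and `wanted` are made-up names; values are copied from rows shown in this session):

```r
library(dplyr)

# Toy stand-in for mite.env (values copied from rows shown in this session)
env_small <- data.frame(
  SubsDens  = c(54.99, 46.07, 62.38, 64.75),
  Substrate = c("Litter", "Interface", "Barepeat", "Sphagn2"),
  stringsAsFactors = FALSE
)

wanted <- c("Litter", "Barepeat", "Interface")

# Short form: keep rows whose Substrate is any of the wanted values
with_in <- env_small %>% filter(Substrate %in% wanted)

# Long form: the same test written out with == and |
with_or <- env_small %>%
  filter(Substrate == "Litter" | Substrate == "Barepeat" | Substrate == "Interface")

# Both give the same rows
identical(with_in, with_or)
```

Storing the values in a vector such as `wanted` also makes it easy to reuse the same set of substrates in later steps.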
- We will use ggplot(), which comes from the ggplot2 package.
- ggplot() just initiates the plot; the histogram itself comes from the geom_histogram() part.
- The format is: ggplot(data = DATAFRAME, aes(x = COLUMN_FOR_HISTOGRAM)) + geom_histogram()
ggplot(data = mite.env, aes(x = SubsDens)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(data = mite.env, aes(x = SubsDens)) + geom_histogram(binwidth = 10)
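binwidth sets the width of each bar in the units of the data; the bins argument sets the number of bars instead. A sketch on a toy data frame (`env_small` is a made-up name; the SubsDens values are copied from mite.env rows shown in this session):

```r
library(ggplot2)

# Toy data: SubsDens values copied from mite.env rows shown in this session
env_small <- data.frame(
  SubsDens = c(64.75, 62.38, 52.73, 52.12, 54.99, 46.07, 80.59, 61.43)
)

# Control the bars by width (data units) ...
p_width <- ggplot(data = env_small, aes(x = SubsDens)) +
  geom_histogram(binwidth = 10)

# ... or by the number of bars
p_bins <- ggplot(data = env_small, aes(x = SubsDens)) +
  geom_histogram(bins = 5)

# Look at the bin edges and counts ggplot computed for the first plot
bin_table <- layer_data(p_width)[, c("xmin", "xmax", "count")]
bin_table
```

Printing `p_width` or `p_bins` draws the plot; `layer_data()` is handy when you want to check what the bars actually represent.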
Actually, it should be real, but it should also be tidy
getwd()
setwd("path/to/your/folder") # placeholder path: setwd() needs a directory
aMess <- read.csv("data/messyDataExample.csv")
head(aMess)
## Ashburton.Lakes.weight.of.vegetation.harvest.subsample X X.1
## 1
## 2 Date:
## 3 Lab team:
## 4
## 5
## 6 Wet weight
## X.2 X.3 X.4 X.5 X.6
## 1
## 2
## 3
## 4
## 5
## 6 Wet weight 1 Wet weight 2 Wet weight 3 Average wet weight (g)
## X.7 X.8 X.9 X.10
## 1
## 2
## 3
## 4
## 5
## 6 Dry weight Dry weight 1 Dry weight 2 Dry weight 3
## X.11 X.12 X.13 X.14 X.15 X.16 X.17 X.18 X.19 X.20 X.21
## 1 NA
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 Average dry weight (g) NA
## X.22 X.23 X.24 X.25 X.26 X.27 X.28 X.29 X.30 X.31 X.32
## 1 NA
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
To see the whole thing: View(aMess)
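Column names like X and X.1 appear when title rows sit above the real header. One possible fix at read time is the skip argument of read.csv(). A sketch on a tiny made-up file (the file contents, column names and values below are invented for illustration and are not the workshop data):

```r
# Build a small messy CSV in a temporary file:
# two title rows sit above the real header (all values are made up)
messy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Weight of vegetation harvest subsample,,",
  "Date:,,",
  "plot,wet_weight_g,dry_weight_g",
  "A,12.5,3.1",
  "B,10.2,2.7"
), messy_path)

# Read as-is: the first title row is mistaken for the column names
raw <- read.csv(messy_path)
names(raw)

# Skip the two title rows so the real header supplies the column names
clean <- read.csv(messy_path, skip = 2)
names(clean)
str(clean)
```

This only rescues files where the mess is confined to the top; spreadsheets with merged cells or several tables per sheet are better fixed at the source.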
Compare the output of names(aMess) and names(mite.env)
[group task]
What did we call it (i.e. the filename)?!
Then we can read it back in
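A minimal sketch of the round trip, writing a data frame to a CSV and reading it back in. The filename `data_mite-env-subset.csv` is hypothetical (chosen to follow the "_" and "-" naming convention), the file is written to a temporary folder, and the values are copied from mite.env rows shown earlier:

```r
# Small tidy data frame (values copied from mite.env rows shown earlier)
tidy <- data.frame(
  SubsDens = c(64.75, 62.38),
  WatrCont = c(691.79, 708.16),
  Shrub    = c("Few", "Few")
)

# Hypothetical filename following the naming convention; saved in a temp folder
out_path <- file.path(tempdir(), "data_mite-env-subset.csv")

# row.names = FALSE stops R adding an extra, unnamed column of row numbers
write.csv(tidy, out_path, row.names = FALSE)

# Read it back in and check that nothing changed on the way
back <- read.csv(out_path)
all.equal(tidy, back)
```

In a real project you would write to a folder inside your working directory (for example "data/") rather than tempdir().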